Maximal Deviations of Incomplete U-statistics with Applications to Empirical Risk Sampling

Authors

  • Stéphan Clémençon
  • Sylvain Robbiano
  • Jessica Tressou
Abstract

It is the goal of this paper to extend the Empirical Risk Minimization (ERM) paradigm, from a practical perspective, to the situation where a natural estimate of the risk is of the form of a K-sample U-statistic, as is the case in the K-partite ranking problem for instance. Indeed, the numerical computation of the empirical risk is hardly feasible, if not infeasible, even for moderate sample sizes. Precisely, it involves averaging $O(n^{d_1+\ldots+d_K})$ terms, when considering a U-statistic of degrees $(d_1, \ldots, d_K)$ based on samples of sizes proportional to $n$. We propose here to consider a drastically simpler Monte-Carlo version of the empirical risk based on $O(n)$ terms only, which can be viewed as an incomplete generalized U-statistic, and prove that, remarkably, the approximation stage does not damage the ERM procedure and yields a learning rate of order $O_{\mathbb{P}}(1/\sqrt{n})$. Beyond a theoretical analysis guaranteeing the validity of this approach, numerical experiments are displayed for illustrative ...
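The contrast between the complete and the incomplete statistic can be illustrated with a minimal Python sketch for the simplest case, K = 2 samples and degrees (1, 1), which corresponds to an AUC-type bipartite ranking criterion. The kernel, the function names, the `budget` parameter and the choice of drawing pairs uniformly with replacement are all assumptions made for illustration; the sampling scheme analysed in the paper may differ.

```python
import random
from itertools import product


def kernel(x_pos, x_neg):
    """Illustrative ranking kernel: 1 if the positive scores higher, 0.5 on ties."""
    if x_pos > x_neg:
        return 1.0
    if x_pos == x_neg:
        return 0.5
    return 0.0


def complete_u_statistic(pos, neg):
    """Average the kernel over all len(pos) * len(neg) pairs."""
    total = sum(kernel(p, q) for p, q in product(pos, neg))
    return total / (len(pos) * len(neg))


def incomplete_u_statistic(pos, neg, budget, rng):
    """Average the kernel over `budget` pairs drawn uniformly with replacement.

    This sampling scheme is an assumption of the sketch, not the paper's definition.
    """
    total = 0.0
    for _ in range(budget):
        total += kernel(rng.choice(pos), rng.choice(neg))
    return total / budget


rng = random.Random(0)
n = 500
pos = [rng.gauss(1.0, 1.0) for _ in range(n)]  # synthetic scores of positive instances
neg = [rng.gauss(0.0, 1.0) for _ in range(n)]  # synthetic scores of negative instances

full = complete_u_statistic(pos, neg)  # n * n = 250,000 kernel evaluations
approx = incomplete_u_statistic(pos, neg, budget=n, rng=rng)  # only n = 500 evaluations
print(f"complete: {full:.3f}  incomplete: {approx:.3f}")
```

With a budget proportional to n, the Monte-Carlo average needs n kernel evaluations instead of n squared, and its extra sampling error is of order 1/√n, the same order as the learning rate quoted in the abstract.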

Similar resources

Scaling-up Empirical Risk Minimization: Optimization of Incomplete $U$-statistics

In a wide range of statistical learning problems such as ranking, clustering or metric learning among others, the risk is accurately estimated by U-statistics of degree d ≥ 1, i.e. functionals of the training data with low variance that take the form of averages over k-tuples. From a computational perspective, the calculation of such statistics is highly expensive even for a moderate sample siz...

SGD Algorithms based on Incomplete U-statistics: Large-Scale Minimization of Empirical Risk

In many learning problems, ranging from clustering to ranking through metric learning, empirical estimates of the risk functional consist of an average over tuples (e.g., pairs or triplets) of observations, rather than over individual observations. In this paper, we focus on how to best implement a stochastic approximation approach to solve such risk minimization problems. We argue that in the ...

Moderate Deviations for Functional U-processes

The moderate deviations principle is shown for the partial sums processes built on U-empirical measures of Polish space valued random variables and on U-statistics of real valued kernel functions. It is proved that in the non-degenerate case the conditions for the time fixed principles suffice for the moderate deviations principle to carry over to the corresponding partial sums processes. Given a u...

A class of risk processes with reserve-dependent premium rate: sample path large deviations and importance sampling

Let (X(t)) be a risk process with reserve-dependent premium rate, delayed claims and initial capital u. Consider a class of risk processes {(Xε(t)) : ε > 0} derived from (X(t)) via scaling in a slow Markov walk sense, and let Ψε(u) be the corresponding ruin probability. In this paper we prove sample path large deviations for (X(t)) as ε → 0. As a consequence, we give exact asymptotics for log Ψ...

Delta Method in Large Deviations and Moderate Deviations for Estimators by Fuqing

The delta method is a popular and elementary tool for deriving limiting distributions of transformed statistics, while applications of asymptotic distributions do not allow one to obtain desirable accuracy of approximation for tail probabilities. The large and moderate deviation theory can achieve this goal. Motivated by the delta method in weak convergence, a general delta method in large devi...

Publication date: 2013